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Abstract 

We  propose  a  method  for  choosing  the  number  of  colors,  or  true  gray  levels,  in  an  image. 
This  is  motivated  by  medical  and  satellite  image  segmentation,  and  may  also  be  useful  for 
color  and  gray  scale  image  quantization,  the  display  and  storage  of  computer-generated  holo¬ 
grams,  and  the  use  of  cooccurrence  matrices  for  assessing  texture  in  images.  Our  underlying 
probability  model  is  a  hidden  Markov  random  field.  Each  number  of  colors  considered  is 
viewed  as  corresponding  to  a  statistical  model  for  the  image,  and  the  resulting  models  are 
compared  via  approximate  Bayes  factors.  The  Bayes  factors  are  approximated  using  BIG, 
where  the  required  maximized  likelihood  is  approximated  by  the  Qian-Titterington  pseudo¬ 
likelihood.  We  call  the  resulting  criterion  FLIC  (Pseudolikelihood  Information  Criterion). 
We  also  discuss  a  simpler  approximation,  MMIC  (Marginal  Mixture  Information  Criterion), 
which  is  based  only  on  the  marginal  distribution  of  pixel  values.  This  turns  out  to  be  use¬ 
ful  for  initialization,  and  also  to  have  moderately  good,  albeit  suboptimal,  performance  in 
its  own  right.  We  apply  PLIC  to  three  examples:  a  simulated  two-band  image,  a  medical 
segmentation  problem,  and  a  satellite  image,  and  in  each  case  it  gives  good  results  in  practice. 

Keywords:  BIG;  Color  image  quantization;  Cooccurrence  matrix;  Hologram;  ICM  algorithm; 
Image  segmentation;  Markov  Random  Field;  Medical  image;  Mixture  model;  Posterior  model 
probability;  Pseudolikelihood;  Satellite  image. 
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1  Introduction 


In  this  paper,  we  consider  the  problem  of  determining  the  number  of  colors  or  gray  levels 
to  be  used  in  presenting  or  interpreting  an  image.  We  are  motivated  primarily  by  problems 
in  medical  image  segmentation.  There,  a  segmentation  is  often  desired  to  delineate  organs, 
tumors  or  other  features  in  pixelized  X-ray,  MRI,  PET,  CAT  or  other  images.  If  the  image  is 
segmented  into  too  many  colors,  it  may  make  the  border  finding  problem  harder,  while  too 
few  colors  may  result  in  border  information  being  lost.  The  idea  is  thus  to  find  the  number 
of  colors  needed  to  represent  the  information  in  the  image  (e.g.  Umbaugh  et  al.  (1993); 
Hance  et  al.  (1996)).  Using  no  more  than  the  number  of  gray  levels  needed  facilitates  the 
presentation  and  analysis  of  mammograms,  for  example  Byng  et  al.  (1997). 

A  second  area  of  application  of  this  problem  is  color  image  quantization,  the  process 
by  which  an  original  color  image  is  mapped  into  an  output  image  with  a  limited  number  of 
colors,  while  attempting  to  preserve  the  image  quality.  This  arises  because  real-world  images 
typically  come  in  many  colors,  whereas  output  devices  can  often  display  far  fewer.  In  some 
cases,  the  number  of  colors  used  in  an  image  can  be  reduced  by  a  factor  of  1000  or  more 
without  much  decrease  in  quality  Dixit  (1991).  Cluster  analysis  has  been  extensively  used  to 
find  the  best  color  mapping  (Dixit  (1991);  Xiang  and  Joy  (1994);  Xiang  (1997);  Scheunders 
(1997b);  Scheunders  (1997a);  Papamarkos  (1999)).  Clearly,  the  fewer  colors  that  can  be  used 
without  degrading  the  image,  the  better  for  many  purposes.  However,  the  number  of  colors 
used  has  in  general  been  subjectively  defined  by  the  user. 

A  third  area  of  application  is  the  display  and  storage  of  computer-generated  holograms. 
Bokor  and  Papp  (1996)  investigated  the  effect  of  the  number  of  gray  levels  used  on  image 
fidelity.  They  found,  rather  surprisingly,  that  image  fidelity  was  maximized  in  their  experi¬ 
ments  when  binary  holograms  (with  2  gray  levels)  were  used.  Burr  et  al.  (1998)  showed  that 
5  gray  levels  is  optimal  for  hologram  storage  in  the  absence  of  signal-dependent  noise,  but 
that  because  of  the  presence  of  signal-dependent  noise,  3  gray  levels  will  most  likely  be  the 
optimal  choice  for  a  practical  system.  These  results  suggest  that  the  use  of  small  numbers  of 
gray  levels  is  advantageous,  and  that  the  appropriate  number  may  depend  on  the  situation, 
pointing  to  the  need  for  a  data-based  way  of  choosing  the  number. 

A  fourth  area  of  application  arises  in  the  use  of  cooccurrence  matrices  for  assessing  texture 
in  images  (Haralick  et  al.  (1973);  Haralick  (1979)).  The  cooccurrence  matrix  of  an  image  at 
distance  d  =  (di,  ^2)  is  the  empirical  discrete  joint  distribution  of  the  pixel  value  at  location 
{xi,X2)  and  the  value  at  location  [xi  +  di,X2  +  0^2),  after  the  observed  pixel  values  have 
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been  grouped  into  G  bins  or  “colors”.  Valckx  and  Thijssen  (1997)  considered  the  effect  of 
the  number  of  bins  on  the  effectiveness  of  cooccurrence  matrices  in  characterizing  textures 
in  echograms.  They  concluded  that  in  gray-scale  echograms  with  256  gray  levels,  far  fewer 
were  needed  to  differentiate  textures;  they  found  64  bins  to  be  enough. 

There  thus  seems  to  be  a  widespread  feeling  in  the  literature  that  image  analysis  can 
often  gain  by  using  a  relatively  small  number  of  colors  or  “true  gray  levels” .  Also,  it  often 
seems  that  the  underlying  subject  matter  theory  or  knowledge  is  not  enough  to  completely 
determine  the  number  of  colors  to  be  used,  even  though  it  may  often  suggest  a  reasonable 
range.  This  points  to  the  need  to  use  the  image  data  themselves  to  help  make  this  deter¬ 
mination.  However,  there  is  a  striking  lack  of  systematic  data-based  approaches  to  deciding 
how  many  colors  should  be  used.  Umbaugh  et  al.  (1992)  used  “automatic  induction”  to 
determine  the  number  of  colors,  but  they  did  not  describe  the  method  in  detail.  Godtlieb- 
sen  and  Chu  (1995)  used  a  method  based  on  kernel  density  estimation  from  the  marginal 
gray-scale  histogram. 

Here  we  introduce  a  new,  model-based,  approach  to  this  problem  that  uses  approximate 
Bayes  factors.  We  model  the  image  in  terms  of  a  Markov  random  field,  and  then  each  number 
of  colors  considered  corresponds  to  a  different  statistical  model  for  the  image  data.  By  doing 
this,  we  recast  the  problem  of  determining  the  number  of  colors  in  the  image  as  a  problem  of 
statistical  model  comparison,  and  for  this  we  use  the  standard  Bayesian  approach  of  Bayes 
factors.  We  propose  an  approximation  to  the  Bayes  factor  based  on  the  pseudolikelihood, 
called  FLIC  (for  Pseudolikehood  Information  Criterion). 

We  also  discuss  a  simpler  approach  based  on  the  marginal  distribution  of  pixel  values. 
We  use  the  fact  that  this  is  a  finite  mixture  distribution,  and  we  use  approximate  Bayes 
factors  to  choose  the  number  of  colors.  This  is  based  on  the  idea  of  marginal  mixture 
EM  segmentation  Stanford  (1999),  and  we  call  it  MMIC  (for  Marginal  Mixture  Information 
Criterion).  This  simpler  approach  is  useful  for  initializing  our  preferred  PLIC  method,  and 
also  works  moderately  well  in  its  own  right.  The  big  difference  between  the  two  methods  is 
that  MMIC  takes  no  account  of  spatial  structure,  whereas  PLIC  takes  account  of  it  explicitly. 

In  Section  2,  we  review  the  Bayesian  image  models  on  which  our  work  is  based.  In 
Section  3,  we  review  the  basic  ideas  behind  Bayes  factors  and  give  the  marginal  mixture  EM 
segmentation  method.  We  then  introduce  the  PLIC  approximation.  In  Section  4,  we  give 
three  examples.  The  first  is  a  simulated  two-color  example,  and  there  PLIC  gives  a  good 
answer,  whereas  MMIC  is  less  satisfactory;  this  illustrates  the  value  of  taking  account  of  the 
spatial  structure.  We  then  consider  a  medical  image  segmentation  problem,  and  finally  a 
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satellite  image.  In  both  of  these  latter  examples,  PLIC  gives  good  answers. 


2  Bayesian  Image  Modeling 

2.1  Markov  Random  Fields  With  Noise 

The  model  that  underlies  our  work  is  a  standard  one  in  Bayesian  image  analysis:  a  Markov 
random  field  with  observation  noise.  We  denote  the  value  observed  at  pixel  i  by  T*,  which 
will  be  a  scalar  for  gray-scale  images,  and  a  vector  for  color  or  multispectral  ones.  For 
each  observed  1^,  there  is  a  corresponding  discrete- valued  unobservable  state,  Wj,  which 
determines  the  distribution  of  T*;  the  set  of  all  the  Xi  values  is  called  the  “true  scene” .  Each 
possible  state  for  Xi  corresponds  to  a  particular  distribution  of  T*.  The  T*  are  assumed  to  be 
conditionally  independent  given  the  Wj,  and  so  dependence  among  the  variables  occurs 
only  through  dependence  among  the  Xi  values. 

We  impose  a  dependence  structure  on  the  Wj,  using  a  Markov  random  field  to  model  the 
true  state  of  each  pixel.  This  is  a  hidden  Markov  random  field  model  because  it  is  observable 
only  through  the  values.  Suppose  that  there  are  K  possible  states,  so  that  Xi  is  an  integer 
between  1  and  K.  We  define  I{Xi,Xj)  as  an  indicator  function  equal  to  1  when  Xi  =  Xj 
and  to  zero  otherwise.  We  let  N{Xi)  be  the  set  of  neighbors  of  Xi]  here  we  will  take  these 
to  be  the  8  pixels  adjacent  to  pixel  Xi.  We  let  U{N{Xi),  k)  denote  the  number  of  points  in 
N{Xi)  which  have  state  k,  so  that  U{N{Xi),Xi)  is  the  number  of  neighbors  of  pixel  i  which 
have  the  same  state  as  pixel  i.  For  our  work  here  we  use  the  Potts  model,  defined  as  follows: 


p{X)  (xexp{(f)Y,I{Xi,Xj)), 


(1) 


where  the  sum  is  over  all  neighbor  pairs,  i  ^  j.  This  leads  to  the  following  conditional 
distribution: 


p{Xi  =  m\N{Xi),<t>)  = 


exp(^[/(iV(Wj),m)) 


(2) 


Y.k^xp{(t>U{N{Xilk))- 
The  parameter  (p  expresses  the  amount  of  spatial  homogeneity  in  the  model.  A  positive 
value  of  (p  means  that  neighboring  pixels  tend  to  be  similar,  while  a  negative  value  would 
mean  that  neighboring  pixels  tend  to  be  dissimilar.  If  ^  =  0,  then  the  pixels  are  independent. 
Note  that  pixels  on  the  boundary  of  an  image  will  not  have  a  full  set  of  observed  neighbors. 
For  simplicity  and  because  the  boundary  is  only  a  small  fraction  of  the  data,  we  exclude 
boundary  pixels  from  the  analysis  except  in  their  use  as  neighbors  of  interior  pixels. 
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2.2  Parameter  Estimation 

The  Iterated  Conditional  Modes  (ICM)  algorthm  was  introduced  by  Besag  (1986).  It  can 
be  used  as  a  method  of  image  reconstruction  when  local  characteristics  of  the  true  image 
can  be  modeled  as  a  Markov  random  field.  In  particular,  this  can  be  used  with  the  model 
described  in  (1).  The  algorithm  begins  with  an  initial  estimate  of  the  true  scene  X,  and 
proceeds  iteratively  to  provide  an  estimate  of  the  parameters  of  the  conditional  distribution 
of  Yi  given  Xi,  as  well  as  (p  and  X. 

An  initial  estimate  of  X,  which  is  required  for  ICM,  can  be  found  through  the  simple 
marginal  mixture  EM  segmentation  method  Stanford  (1999).  This  is  based  on  the  fact  that 
if,  for  example,  the  conditional  distribution  of  T*  given  X^  is  Gaussian,  then  the  marginal 
distribution  of  the  observed  pixel  values,  T*,  is  a  finite  mixture  of  Gaussians  with  K  com¬ 
ponents.  The  EM  algorithm  Dempster  et  al.  (1977)  can  be  used  to  fit  a  Gaussian  mixture 
model  to  this  distribution,  and  a  maximum  likelihood  classification  of  each  of  the  pixels  into 
one  of  the  K  components  yields  an  initial  estimate  of  X. 

3  Approximate  Bayes  Factors  for  Choosing  the  Num¬ 
ber  of  Colors  or  True  Gray  Levels:  PLIC  and  MMIC 

Our  general  approach  to  the  problem  of  choosing  the  number  of  colors  or  true  gray  levels 
in  an  image,  AT,  is  to  recast  it  as  a  statistical  model  selection  problem,  and  then  to  use  the 
standard  Bayes  factor  approach  to  choose  the  appropriate  model.  The  Markov  random  field 
model  in  Section  2  is  viewed  as  defining  not  one  model,  but  several,  one  for  each  value  of 
K  considered.  In  Section  3.1  we  review  the  basic  ideas  of  Bayes  factors,  in  Section  3.2  we 
introduce  our  pseudolikelihood-based  approximation,  PLIC,  and  in  Section  3.3  we  discuss 
the  simpler  marginal  mixture  approximation,  MMIC. 

3.1  Bayes  Factors 

The  Bayesian  approach  to  model  comparison  and  model  selection  is  based  on  posterior  model 
probabilities.  Given  a  set  of  models  considered,  {M^  '■  K  =  1, . . . ,  ATmax}  and  data  D,  the 
posterior  probability  of  model  Mk  is 

^  p{D\Mk)p{Mk) 
i:f=T  p{D\ML)p{MLy 
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where  p{D\Mk)  is  the  integrated  likelihood  of  model  and  p{Mk)  is  the  prior  probability 
of  model  Mk-  Here  we  will  take  the  models  to  be  equally  likely  a  priori,  so  that  p{Mk)  = 

l/-^max  {K  =1,.  .  .  ,  i^max)- 

The  integrated  likelihood,  p{D\Mk)-,  is  defined  by 

p{d\Mk)  =  I  p{D\eK,  MK)p{eK)deK,  (4) 

where  9k  is  the  parameter  (vector)  for  model  Mk,  p{D\9k,  Mk)  is  the  (usual)  likelihood, 
and  p{9k)  is  the  prior  distribution.  The  quantity 

p(D\Mk) 
p(D\Mr) 

is  known  as  the  Bayes  /actor  for  model  Mk  against  model  M^.  See  Kass  and  Raftery  (1995), 
Raftery  (1995),  and  Hoeting  et  al.  (2000)  for  reviews  of  Bayesian  model  selection  and  Bayes 
factors. 

Evaluating  the  integral  in  (4)  is  often  hard,  and  much  of  the  research  in  this  area  has 
focused  on  ways  of  doing  it.  A  simple,  but  often  reasonably  good  approximation  is 

2\ogp{D\MK)  ^  BIC  (6) 

where 

BIC  =  2\ogp{D\eK,MK)  -  iVlog(dim(0K)),  (7) 

with  9k  being  the  maximum  likelihood  estimator  of  9k-  Under  regularity  conditions  that  are 
roughly  those  that  guarantee  consistency  and  asymptotic  normality  of  9k,  the  error  in  this 
approximation  is  0(1)  regardless  of  the  prior  p{9k\Mk)  (Cox  and  Hinkley  (1978);  Schwarz 
(1978)).  If,  in  addition,  the  prior  p{9k\Mk)  is  a  unit  information  prior  (i.e.  a  multivariate 
normal  prior  distribution  centered  at  the  MLE  with  variance  matrix  equal  to  the  inverse 
of  the  Fisher  information  matrix  for  one  observation),  then  the  error  is  0{N~^l‘^)  Kass  and 
Wasserman  (1995).  Raftery  (1999)  argues  that  this  prior,  while  proper,  is  conservative, 
and  so  may  be  appropriate  as  the  basis  for  a  baseline  reference  analysis.  Differences  of  10 
or  more  in  BIC  values  between  competing  models  are  conventionally  viewed  as  consituting 
strong  evidence  for  the  favored  model  over  its  competitor  Kass  and  Raftery  (1995). 

3.2  Penalized  Pseudolikelihood  Criterion:  PLIC 

We  wish  to  conduct  inference  for  K,  the  number  of  segments  in  the  image,  using  the  Bayesian 
approach  in  Section  3.1.  Here  Mk  refers  to  the  Potts  model  (1)  with  K  components.  Thus 
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6k  consists  of  (f)  and  the  parameters  of  the  conditional  distribution  of  F*  given  Xj.  An  initial 
thought  would  be  to  use  BIC^  as  defined  by  (7).  A  proposal  along  these  lines  was  made  by 
Ji  and  Seymour  (1996)  for  choosing  between  different  Markov  random  field  models  in  the 
case  where  the  true  scene  is  directly  observed  and  the  number  of  colors  or  true  gray  levels  is 
known  in  advance.  Neither  of  these  is  the  case  in  the  applications  we  have  in  mind,  and  Ji 
and  Seymour  (1996)  did  not  consider  the  choice  of  the  number  of  segments.  When,  as  here, 
the  true  scene  is  not  observed,  this  would  require  evaluation  of  the  likelihood  of  the  observed 
data,  L{Y\K),  namely 

L{Y\K)  =  Y.p{Y\X  =  X,  K)p{X  =  x\K).  (8) 

X 

The  sum  in  (8)  involves  all  possible  configurations  of  the  hidden  states.  With  N  pixels  and  K 
states,  there  are  possible  configurations,  which  is  huge,  making  this  approach  intractable. 
Instead,  we  approximate  the  likelihood  term  by  a  pseudolikelihood  proposed  by  Qian  and 
Titterington  (1991a)  and  Qian  and  Titterington  (1991b),  which  maintains  computational 
feasibility. 

The  basic  idea  of  this  pseudolikelihood  is  that  instead  of  summing  over  all  possible 
configurations  of  W,  we  consider  only  configurations  that  are  close  to  the  ICM  estimate  of 
W,  denoted  by  X.  Specifically,  we  consider  each  pixel  T*  in  turn  and  condition  on  W_j,  which 
is  X  excluding  the  value  at  Wj.  We  then  obtain  the  following  conditional  likelihood,  in  which 
N{Xi)  denotes  the  neighbors  of  Xp 

L{Yi\X_i,K)  =  j^p{Yi\Xi  =  j)p{Xi  =  j\N{Xi)).  (9) 

1=1 

The  first  term  in  the  sum,  p(l^|Wj  =  j),  simply  requires  evaluation  of  the  conditional 
density  of  T*  given  Xp  the  second  term,  p{Xi  =  j\N{Xi))  is  evaluated  using  (2).  The 
conditional  likelihoods  from  (9)  are  combined  to  form  the  Qian-Titterington  pseudolikelihood 
of  the  image,  as  follows: 


Lx<.Y\K)  =  Y[f{Yi\X_a)  (10) 

i 

= (11) 

i  j=l 

We  replace  the  intractable  L{Y\K)  by  the  easily  computable  Lj^{Y\K)  from  (11),  to 
obtain  our  new  approximation 

PLIC{K)  =  2\og{Lj^{Y\K))  -  DK\og{N).  (12) 
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Ideally,  one  could  compute  PLIC{K)  for  a  large  range  of  K  values  and  then  choose  K  to 
maximize  PLIC{K).  However,  we  do  not  expect  the  model  assumptions  to  hold  for  values 
of  K  very  far  from  the  true  value.  Because  of  this,  we  adopt  a  sequential  approach.  We 
begin  by  computing  PLIC{K)  for  =  1,  and  then  incrementally  increase  the  value  of  K. 
At  each  step,  we  compare  PLIC{K)  with  PLIC{K  —  1),  and  stop  the  process  when  the 
smaller  model  is  preferred.  In  other  words,  as  we  increase  K  incrementally  from  AT  =  1,  we 
take  the  first  local  maximum  of  PLIC{K)  to  be  our  choice  for  the  number  of  segments  K. 

3.3  MMIC:  A  Simpler  Bayes  Factor  Approximation  for  Initializa¬ 
tion  and  Fast  Computation 

A  faster  approximate  Bayes  factor  approach  to  choosing  the  number  of  colors  or  true  gray 
levels  is  available  by  just  considering  the  empirical  marginal  distribution  of  pixel  values 
and  ignoring  their  spatial  locations.  This  is  much  faster  computationally  than  the  FLIC 
method.  It  turns  out  to  give  results  that  are  often  reasonable  at  least  for  initializing  the 
more  demanding  FLIC  method,  although  it  has  the  clear  disadvantage  relative  to  FLIC  that 
it  does  not  take  account  of  salient  spatial  information. 

If  the  data  are  generated  by  the  Markov  random  field  model  (1),  then  the  marginal 
distribution  of  pixels  is  a  finite  mixture  of  K  distributions,  each  equal  to  the  conditional 
distribution  of  given  that  Xi  takes  on  one  of  its  K  possible  values.  The  basic  idea  of 
MMIC  is  to  compute  the  BIC  value  for  each  number  of  components  in  this  finite  mixture 
distribution. 

Suppose  we  have  observations  Y  =  (Yi,  ...,Yn)  from  a  mixture  model  with  K  components. 
Let  Pj  denote  the  mixture  proportion  of  the  jth  component.  Let  $  be  the  mixing  density  (so 
that,  for  example,  for  a  Gaussian  mixture,  $  is  a  single  Gaussian  density)  with  9  =  {9i, 9k) 
giving  the  parameters  for  the  K  components.  The  marginal  density  for  a  single  observation, 
Fj,  is 


f(YAK,e)  =  Y_Pj'i(YAe,).  (i3) 

t=l 

Then  the  loglikelihood  for  Y,  assuming  that  all  of  the  Yj  are  independent,  is 

iogp(r|A',e)  =  (14) 

i=l  j=l 

After  choosing  9  to  maximize  the  loglikelihood  from  (14),  the  maximized  loglikelihood  can 
be  used  in  the  BIC  formula  (7).  We  refer  to  the  resulting  approximation  as  the  Marginal 
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Mixture  Information  Criterion  (MMIC): 

MMIC{K)  =  2\ogpMLE{Y\K)  -  dK\og{N),  (15) 

where  (Ik  is  the  number  of  parameters  in  the  mixture  model  with  K  components. 

We  view  MMIC  as  a  heuristic  guideline  rather  than  a  rigorously  justified  approximation, 
because  the  likelihood  on  which  it  is  based  is  correct  if  the  F*  are  independent,  which  is 
not  the  case,  and  also  because  the  usual  regularity  conditions  for  BIC  to  approximate  the 
Bayes  factor  do  not  hold  for  mixture  models.  However,  it  is  known  that  choosing  a  model 
based  on  BIC  produces  consistent  density  estimates  when  the  data  are  independent  and  one¬ 
dimensional  Boeder  and  Wasserman  (1997),  and  a  proof  of  the  consistency  of  BIC  as  a  model 
selection  criterion  for  mixture  models  has  been  produced  Keribin  (1998).  In  addition,  BIC 
has  given  good  results  for  choosing  the  number  of  components  in  a  wide  range  of  applications 
of  mixture  models  (Boeder  and  Wasserman  (1997);  Dasgupta  and  Baftery  (1998);  Fraley  and 
Baftery  (1998);  Stanford  (1999);  Stanford  and  Baftery  (2000)). 

4  Examples 

4.1  Simulated  Two-Segment  Image 

The  simulated  two-segment  image  shown  in  Figure  1  is  made  up  of  two  solid  bands,  with 
mean  greyscale  values  of  120  and  140.  Independent  Gaussian  noise  with  mean  0  and  standard 
deviation  10  was  added  to  each  pixel,  and  then  the  values  were  rounded  to  integers. 

The  two  bands  are  close  enough  together  on  average  that  the  marginal  information  alone 
makes  it  hard  to  see  that  there  are  actually  two  segments  here.  The  marginal  histogram  of 
the  pixels  in  the  simulated  image  is  shown  in  Figure  2,  and  this  is  quite  unimodal.  Visually 
it  is  quite  clear  from  the  image.  Figure  1,  that  there  are  two  bands,  but  this  is  due  to  the 
spatial  arrangement  of  the  pixels;  this  is  a  case  where  there  should  be  advantage  to  taking 
account  of  the  spatial  structure. 

Table  1  shows  the  FLIC  and  MMIC  values  for  this  image.  FLIC,  which  takes  account  of 
the  spatial  information,  correctly  and  quite  decisively  determines  that  there  are  two  segments, 
while  MMIC  identifies  only  one  segment  (although  the  difference  of  about  3  between  the 
MMIC  values  for  one  and  two  segments  would  be  viewed  as  indecisive). 

Compare  this  to  the  scrambled  image  in  Figure  3,  which  is  a  random  reordering  of  the 
pixels  from  Figure  1.  Since  these  two  images  contain  exactly  the  same  pixels  (just  in  a 
different  order),  their  marginal  information  will  be  the  same.  Thus  the  histogram  of  the 
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Table  1:  FLIC  and  MMIC  Results  for  the  Simulated  Two-Segment  Image  and  the  Scrambled 
Image.  The  number  of  segments  preferred  by  a  method  for  the  given  image  is  shown  in 
boldface. 


Two-Segment  Image 

Scrambled  Image 

Segments 

4> 

FLIC 

MMIC 

4> 

FLIC 

MMIC 

1 

- 

-11731.28 

-12998.79 

- 

-11756.49 

-12998.79 

2 

2.06 

-10784.69 

-13002.17 

0.05 

-11833.84 

-13002.17 

3 

0.45 

-10948.62 

-13021.64 

0.19 

-11868.36 

-13021.64 

values  in  the  scrambled  image,  Figure  3,  is  the  same  as  for  the  original  image  in  Figure  1, 
and  so  is  also  shown  in  Figure  2.  Table  1  shows  that  both  FLIC  and  MMIC  choose  one 
segment  for  the  scrambled  image  -  FLIC  quite  decisively  and  MMIC  just  barely.  From  most 
points  of  view,  this  is  the  correct  answer  in  this  case.  Again,  FLIC  seems  to  perform  better, 
because  it  chooses  the  right  answer  decisively,  whereas  MMIC  is  uncertain  between  the  two 
answers. 

The  estimated  (p  values  used  in  computing  FLIC  are  also  shown  in  the  table.  The  large 
(p  for  the  two  segment  fit  of  the  two-segment  image  indicates  that  a  large  degree  of  spatial 
association  is  found;  this  results  in  a  much  higher  value  of  FLIC  than  is  obtained  for  either 
one  or  three  segments. 

4.2  Medical  Image  Segmentation:  Dog  Lung 

Figure  4  shows  a  FET  image  of  a  dog  lung.  This  image  was  obtained  from  Dr.  H.  T. 
Robertson  at  the  University  of  Washington  Division  of  Fulmonary  and  Critical  Care.  The 
goal  of  the  analysis  here  is  to  delineate  the  lung  in  the  image  automatically. 

It  is  clear  from  Figure  4  that  the  actual  image  area  is  circular,  with  the  corners  of  the 
image  filled  in  with  a  constant  grey  value;  this  reflects  the  way  the  FET  image  is  taken.  This 
sort  of  artifact  can  be  removed  quite  easily  with  a  mixture  model;  one  of  the  components 
converges  to  a  spike  for  that  grey  level,  effectively  separating  it  from  the  rest  of  the  data. 
This  can  be  clearly  seen  in  the  marginal  histogram  of  the  image  shown  in  Figure  5. 

Table  2  shows  that  FLIC  quite  decisively  chooses  four  segments  for  this  image.  In  the 
context  of  FET  imagery,  the  choice  of  four  segments  is  quite  reasonable  for  this  image.  Two 
segments  are  needed  for  the  background:  one  to  model  the  spike  (due  to  the  corner  artifact) 
and  one  for  the  general  background.  Since  the  image  is  constructed  based  on  radioactive 
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Figure  5:  Marginal  Histogram  of  the  Dog  Lung  Image. 


emissions  from  gas  in  the  lung,  it  is  not  at  all  surprising  to  see  two  segments  for  the  lung 
itself  to  account  for  the  high  gas  density  in  the  interior  of  the  lung  and  the  somewhat  lower 
gas  density  around  the  periphery.  For  this  case,  MMIC  also  chooses  4  segments  using  only 
the  marginal  greyscale  values  from  the  image. 

The  results  of  the  marginal  mixture  EM  segmentation  are  shown  in  Figure  6.  This  does 
a  reasonable  job  of  separating  the  lung  from  the  background,  especially  given  that  it  makes 
no  use  of  spatial  context.  It  provides  a  good  initialization  for  the  segmentation  methods  that 
do  take  account  of  spatial  context. 


Table  2:  FLIC  and  MMIC  Results  for  the  Dog  Lung  Image.  The  preferred  number  of 
segments  is  shown  in  boldface  for  each  method. 


Segments 

<P 

Logpseudolikelihood 

FLIC 

MMIC 

1 

- 

-80641.02 

-161311.13 

-166007.1 

2 

1.14 

-61081.56 

-122221.33 

-137193.0 

3 

1.20 

-59542.47 

-119172.25 

-137019.8 

4 

1.67 

-50961.33 

-102039.06 

-128746.2 

5 

1.07 

-52444.73 

-105034.97 

-135323.5 

6 

1.11 

-52090.24 

-104355.08 

-135333.6 
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Figure  6:  Marginal  Mixture  EM  Segmentation  of  the  Dog  Lung  Image  into  Four  Segments. 

The  ICM  refinement,  shown  in  Figure  7,  does  a  good  job  of  reducing  clutter  in  the  image, 
and  gives  a  qualitatively  satisfactory  answer.  We  also  applied  morphological  smoothing 
Matheron  (1975);  Serra  (1982)  to  Figure  7;  the  final  result  is  shown  in  Figure  8.  This  gives 
additional  smoothing  to  the  outer  edge  of  the  lung,  and  it  also  removes  most  of  the  other 
clutter  in  the  image.  The  small  spot  separate  from  the  lung  and  below  could  be  easily 
removed  by  simply  considering  the  lung  to  be  the  largest  connected  component.  The  small 
void  in  the  center  of  the  lung  is  not  artifactual;  it  is  real. 

We  shared  our  results  with  the  researchers  who  provided  us  with  this  image  and  asked  for 
their  subjective  evaluation  of  the  segmented  image.  They  found  that  our  final  segmentation 
provided  a  good  delineation  of  the  lung  as  a  region  of  interest  for  further  analysis  and  felt 
that  the  segmentation  based  on  four  segments  was  more  useful  than  those  based  on  three  or 
five  segments,  thus  providing  some  clinical  validation  of  our  method  in  this  case. 

The  final  result  shows  that  this  segmentation  method  has  promise  for  the  purpose  of 
automatically  processing  a  database  of  lung  images  to  separate  out  the  region  of  interest 
(the  lung)  from  the  background.  Currently,  the  most  widely  used  method  for  this  sort  of 
segmentation  is  for  a  human  expert  to  manually  outline  the  lung  with  an  interactive  computer 
program,  a  process  which  is  quite  tedious  and  can  take  a  long  time  for  a  large  database  of 
images.  The  automatic  segmentation  algorithm  can  obviate  the  need  for  the  manual  process. 
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Figure  7:  Segmentation  of  the  Dog  Lung  Image  into  4  Segments  by  the  ICM  Algorithm.  The 
algorithm  was  initialized  by  the  marginal  mixture  EM  segmentation  result. 


Figure  8:  Final  Segmentation  of  the  Dog  Lung  Image  into  4  Segments  After  Morphological 
Smoothing  (opening  and  closing,  conditional  on  the  edge  pixels). 
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Figure  9:  Satellite  Image  of  Washington  Coast 
requiring  only  human  inspection  of  the  results. 

4.3  Satellite  Image  Segmentation:  Washington  Coast 

Figure  9  is  a  satellite  image  of  a  section  of  the  Pacific  coast  of  Washington  state’s  Olympic 
peninsula;  this  image,  provided  by  the  US  Geological  Service  as  part  of  the  National  Aerial 
Photography  Program  (NAPP),  was  obtained  from  the  Terraserver  web  page,  terraserver.microsoft.com. 
The  resolution  is  8  meters  per  pixel,  and  it  is  a  256-level  greyscale  image. 

Table  3  shows  that  the  maximum  value  of  PLIC  occurs  at  iP  =  6  segments,  so  we  regard 
this  as  the  optimal  choice  of  K  for  this  model.  For  comparison,  MMIC  values  are  also  shown; 
the  maximum  occurs  at  4  segments.  The  marginal  histogram  of  this  image  is  shown  in  Figure 
10;  the  large  spike  is  due  to  the  small  variance  in  the  greyscale  value  of  the  water. 

For  the  7  segment  case,  the  EM  algorithm  led  to  a  solution  with  only  6  segments.  This 
causes  convergence  problems  and  numerical  instability;  if  one  of  the  mixture  proportions 
approaches  zero,  the  meaning  of  the  corresponding  estimates  of  mean  and  variance  becomes 
unclear.  These  problems  make  the  results  for  the  7  segment  case  somewhat  unreliable;  we 
regard  this  simply  as  evidence  that  the  7-segment  model  does  not  fit  well. 

The  6-segment  results  of  the  marginal  mixture  EM  segmentation  are  shown  in  Figure 
11.  The  water  is  quite  well  classified  as  a  single  segment,  while  the  land  is  mostly  contained 
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Table  3:  FLIC  and  MMIC  Results  for  the  Washington  Coast  Image.  The  7  segment  case, 
noted  with  f,  suffered  convergence  problems  because  the  EM  algorithm  led  to  a  solution 
with  only  6  segments;  this  is  discussed  in  the  text.  The  favored  result  from  each  method  is 
shown  in  boldface. 


Segments 

(f> 

Logpseudolikelihood 

FLIC 

MMIC 

1 

- 

-64521.78 

-129072.20 

-133475.0 

2 

1.23 

-56721.70 

-113500.70 

-125273.0 

3 

0.99 

-45975.06 

-92036.07 

-117607.8 

4 

0.99 

-45514.45 

-91143.49 

-117528.6 

5 

0.99 

-42993.06 

-86129.36 

-117564.9 

6 

1.00 

-42763.60 

-85699.10 

-117537.5 

7 

0.94t 

-71772.181 

-143744.891 

-123280.5t 

Figure  11:  Marginal  Mixture  EM  Segmentation  of  the  Washington  Coast  Image  Into  Six 
Segments. 

in  two  segments.  The  tideline  accounts  for  most  of  the  variability  in  the  image.  Figure  12 
gives  the  ICM  refinement  of  the  segmentation;  we  can  see  that  much  of  the  noise  in  the  land 
interior  has  been  smoothed  out,  while  the  tideline  is  still  quite  clear. 

In  our  final  ICM  segmentation  (Figure  12),  the  water  is  well-characterized  by  the  darkest 
segment.  The  second-darkest  segment  corresponds  to  shallow  tidewater  near  the  bright 
beach,  as  well  as  combining  with  the  third-darkest  segment  to  characterize  most  of  the  dry 
land.  Although  the  second-darkest  segment  comprises  both  dry  land  and  shallow  tidewater, 
these  two  land  types  are  spatially  separated  by  the  beach.  The  two  brightest  segments 
correspond  to  the  beach,  which  is  quite  reflective.  The  third-brightest  segment  is  transitional; 
it  fills  in  between  dry  land  and  water  in  regions  where  the  beach  is  not  evident,  as  well  as 
capturing  the  small  hill  in  the  upper  right  hand  corner  of  the  image. 

5  Discussion 

We  have  proposed  an  approximate  Bayesian  method  for  determining  the  number  of  colors  or 
gray  levels  in  a  noisy  image  from  the  image  data  themselves.  This  number  has  usually  been 
either  predetermined  by  the  user  or  chosen  in  an  ad  hoc  manner,  but  the  literature  shows 
that  there  are  many  instances  where  a  more  formal,  data-based,  method  could  be  useful.  Our 
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Figure  12:  Segmentation  of  the  Washington  Coast  Image  into  Six  Segments  Via  ICM. 

method  works  well  in  several  examples,  and  seems  to  be  potentially  useful  in  the  automatic 
segmentation  of  medical  and  satellite  images,  and  perhaps  also  for  color  and  gray-level  image 
quantization,  and  the  operational  use  of  cooccurrence  matrices  for  characterizing  texture. 
Its  use  in  these  latter  contexts  remains  to  be  explored,  however. 

Our  method  is  fully  defined  for  multispectral  and  color  images,  but  we  have  shown  ex¬ 
amples  of  its  use  only  for  gray-scale  images.  For  multispectral  images,  the  noise  distribution 
would  often  be  taken  to  be  multivariate  normal.  For  initialization,  the  marginal  mixture  EM 
segmentation  method  is  still  available,  for  example  using  model-based  clustering  Fraley  and 
Raftery  (1998).  If  the  number  of  pixels  is  too  large  for  this  to  be  efficient,  a  subsample  of  the 
pixels  could  be  used  for  the  clustering,  and  then  discriminant  analysis  applied  to  classify  the 
remaining  pixels  Banfield  and  Raftery  (1993),  or  an  efficient  method  based  on  the  minimal 
spanning  tree  could  be  used  Posse  (2000). 

Our  approach  has  been  to  compute  approximate  Bayes  factors  using  the  ICM  algorithm 
and  the  pseudolikelihood  of  Qian  and  Titterington  (1991a).  Another  approach  would  be  to 
carry  out  a  fully  Bayesian  analysis  of  the  image  model  using  Markov  chain  Monte  Carlo,  as 
suggested  by  Besag  et  al.  (1991),  and  then  to  estimate  the  integrated  likelihood  using  one 
of  the  available  methods  for  doing  this  from  MCMC  (e.g.  Newton  and  Raftery  (1994);  Chib 
(1995);  Raftery  (1996);  DiCiccio  et  al.  (1997);  Oh  (1999)).  This  would  conform  more  fully 
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to  the  Bayesian  paradigm,  but  could  be  extremely  expensive  computationally. 

Our  method  could  also  be  used  for  choosing  between  competing  higher-order  interaction 
Markov  random  field  models  for  texture  in  images,  e.g.  between  special  cases  of  the  Chien 
model  of  Descombes  et  al.  (1995)  and  Descombes  et  al.  (1996)  or  the  models  of  Tjelmeland 
and  Besag  (1998).  In  some  situations  one  might  expect  there  to  be  a  relationship  between 
the  number  of  colors  used  and  the  complexity  of  the  texture  model  needed  to  describe  the 
resulting  pattern.  Our  approach  could  be  generalized  to  choosing  both  simultaneously.  This 
could  be  done  by  identifying  each  combination  of  texture  model  and  number  of  colors  with 
the  corresponding  probability  model,  and  then  comparing  all  the  resulting  probability  models 
using  approximate  Bayes  factors,  calculated  using  methods  based  on  those  described  here. 
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